Abstract
Introduction The 2022 WHO Classification of Pediatric Tumors marked a major step forward in molecular diagnostics. DNA-methylation profiling has emerged as a core diagnostic tool and has been shown to resolve complex CNS and sarcoma cases. In AML, epigenomic patterns also reflect diagnostic genomic variants, but their clinical use is still emerging. The Acute Leukemia Methylome Atlas (ALMA) set a benchmark by correctly classifying >90% of cases across 27 WHO 2022 AML subtypes using a decision-tree-based machine-learning model, though ~8% of cases remained mislabeled (PMID: 40730747). Here, we describe a transformer-based classifier that further improves acute leukemia (AL) subtype classification, aiming to enhance diagnostic precision with both long-read WGS and methylation arrays.
Methods DNA methylation was profiled with Illumina 450K or EPIC arrays, covering 331556 CpG sites per sample. Processed data were obtained from the published ALMA dataset. For unsupervised learning, the cohort comprised 3314 patient samples, including AML, ALL, MDS, APL, MPAL, and controls. For supervised learning, 2471 samples bearing WHO 2022 subtype labels entered stratified 5-fold cross-validation. Each methylome was compressed into a 64-dimensional latent vector by an autoencoder and then classified by a supervised transformer neural network; hyper-parameters were tuned within the training folds and locked before evaluation. Model performance was quantified with overall accuracy, sensitivity, specificity, weighted recall, and macro and weighted F1-scores. Generalizability was assessed on an independent multi-center cohort from the AML02 and AML08 trials (n=104), carrying confirmed WHO 2022 labels spanning six AML subtypes. Additionally, the classifier was tested on a long-read WGS cohort of 32 patient samples from UF Health Shands Hospital. Clinical truth for the UF cohort was derived from cytogenetics, flow cytometry, and targeted NGS.
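As an illustration of the evaluation design described above, the following is a minimal sketch of stratified fold assignment and of macro versus weighted F1, written in plain Python. It is not taken from the ALMA-classifier package: the helper names `stratified_folds` and `f1_scores`, the toy labels, and the class sizes are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample to one of k folds with similar class proportions.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    fold_of = [None] * len(labels)
    offset = 0  # rotating start so small classes do not all land in fold 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's shuffled samples round-robin across the folds.
        for position, idx in enumerate(indices):
            fold_of[idx] = (offset + position) % k
        offset += len(indices)
    return fold_of


def f1_scores(y_true, y_pred):
    """Return (macro F1, support-weighted F1) over the classes in y_true."""
    classes = sorted(set(y_true))
    support = Counter(y_true)
    per_class = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(classes)
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    return macro, weighted


# Toy, made-up labels: three classes with unequal sizes.
toy_labels = ["A"] * 50 + ["B"] * 30 + ["control"] * 20
folds = stratified_folds(toy_labels, k=5)
fold_counts = [
    Counter(lbl for lbl, f in zip(toy_labels, folds) if f == fold)
    for fold in range(5)
]

# Toy predictions: one "A" sample is miscalled as "B".
macro_f1, weighted_f1 = f1_scores(
    ["A", "A", "A", "A", "B", "B"],
    ["A", "A", "A", "B", "B", "B"],
)
print(fold_counts[0], round(macro_f1, 3), round(weighted_f1, 3))
```

Macro F1 averages the per-class scores equally, so rare subtypes count as much as common ones, whereas weighted F1 scales each class by its number of true samples. Reporting both is a common way to show that high performance is not driven only by the largest classes.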
Results In 5-fold cross-validation, the classifier correctly predicted 2456 of 2471 methylomes, yielding 99.4% overall accuracy. Leukemia-versus-control discrimination was near perfect, with 99.9% sensitivity (2218/2220 malignant) and 98.8% specificity (248/251 controls). Subtype-specific sensitivity (weighted recall) reached 99.3%, macro F1=0.994, and the median per-subtype AUPRC was 0.995. Twenty subtypes had complete concordance. Accuracy remained high (0.95–0.99) for nearly all others; the exceptions were AML t(8;16) KAT6A::CREBBP with 11/12 correct (0.92) and MPAL t(v;11q23.3)/KMT2A-r with 5/6 correct (0.83). Altogether, 15 errors (0.61%) occurred: ten intra-leukemia subtype swaps and five control-vs-leukemia miscalls. This represents a >10-fold reduction versus the original ALMA model, which recorded 10.4% errors (accuracy=0.896, weighted F1=0.927) under the same 5-fold CV. External testing on the independent AML02/08 cohort (n=104) delivered 96.2% accuracy (100/104 correct), weighted F1=0.974, and precision=0.990, despite class imbalance in the test set. In real-world clinical testing using long-read WGS (n=32), all translocations detected by conventional cytogenetics were captured (t(8;21) n=1, t(6;9) n=1, t(16;21) n=1, KMT2A-r n=7) despite variable sequencing coverage (range 1.1x–23.8x). Eight post-therapy marrow samples were labeled “Otherwise-Normal Control”, consistent with morphological remission. Four discordant cases comprised two trisomy 21 AML patients and two B-ALL cases predicted to be AML. Five matched BM/PB pairs were correctly classified and mutually concordant.
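The headline percentages above follow directly from the reported counts. The short Python check below recomputes them; the variable names are chosen for this illustration, and every number is copied from the Results text rather than from any additional data.

```python
# Counts as reported in the Results (5-fold cross-validation).
total, correct = 2471, 2456
malignant_total, malignant_called = 2220, 2218
control_total, control_called = 251, 248

accuracy = correct / total
sensitivity = malignant_called / malignant_total
specificity = control_called / control_total

# Error breakdown: binary miscalls plus intra-leukemia subtype swaps.
errors = total - correct
binary_miscalls = (malignant_total - malignant_called) + (control_total - control_called)
subtype_swaps = errors - binary_miscalls
error_rate = errors / total

# Comparison with the original ALMA model (reported accuracy 0.896).
alma_error_rate = 1 - 0.896
fold_reduction = alma_error_rate / error_rate

# External AML02/08 cohort: 100 of 104 correct.
external_accuracy = 100 / 104

print(f"accuracy      {accuracy:.1%}")
print(f"sensitivity   {sensitivity:.1%}")
print(f"specificity   {specificity:.1%}")
print(f"errors        {errors} ({error_rate:.2%}): "
      f"{subtype_swaps} subtype swaps, {binary_miscalls} binary miscalls")
print(f"error reduction vs. ALMA  {fold_reduction:.1f}-fold")
print(f"external accuracy  {external_accuracy:.1%}")
```

The recomputed values agree with the text: 15 errors split into ten subtype swaps and five control-vs-leukemia miscalls, and the error rate is roughly 17 times lower than the 10.4% reported for the original ALMA model, consistent with the stated >10-fold reduction.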
Conclusions By combining latent-space learning with a tabular transformer, the resulting classifier achieved 99% WHO-2022 concordance on >2400 methylomes and preserved ≥95% accuracy in two multi-center external datasets, including low-coverage long-read WGS. The publicly available, open-source package (github.com/f-marchi/ALMA-classifier, v0.2.0) accepts both array and long-read WGS inputs, enabling rapid molecular classification in resource-limited settings or when time-critical diagnosis is needed. Ongoing work aims to expand the classifier's capabilities to include lymphomas, Langerhans cell histiocytosis, trisomy-21 AML/ALL, chronic leukemias, multiple myeloma, and juvenile myelomonocytic leukemia.